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1 Problem 

We use the abbreviation MDS for the minimization of Kruskal’s stress 

( 1 ) 

l<£<j<n 

over the n x p configurations X (Kruskal (1964a), Kruskal (1964b)). Symmetric matrix 
W = {wij} has known non-negative weights , symmetric matrix A = {<%} has non-negative 
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dissimilarities, and the symmetric matrix-valued function D(X) = {dij(X)} has Euclidean 
distances between the points , i.e. the rows of X. 

In the usual applications of MDS the user chooses dimensionality p to be much smaller than 
n, because the name of the game is dimensionality reduction. In this paper we give a detailed 
analysis of full dimensional scaling or FDS, which is MDS with p = n. 

2 Notation 

We introduce some notation which is useful for all Euclidean MDS problems, not just FDS 
problems. This notation has been more or less standard since De Leeuw (1977). Suppose 
the e t are unit vectors of length n, i.e. e t has all its elements equal to zero except element i, 
which is equal to one. Also define the matrices A tJ := (e* — ej)(e t — e^)'. Thus Aij has four 
nonzero elements: elements (i,i) and (j, j) are +1, elements (i,j) and (j,i) are —1. With 
these definitions 

dl{X) = (, Si - e j )'XX\e i - ej ) = tr X'A {j X = tr A^XX'. (2) 

Now expand stress. If we assume without loss of generality that 

X E w v 5 ij = >• 

and we define 


P(X):= EEVi Ai(X), (3) 

v( x ) ■■= w n d l- ( 4 ) 

Now 

a(X) = 1 - p(X) + tr,(X). (5) 

To develop some convenient matrix notation we define 


v '■= EE w H A Hi 

l<i<j<n 

in- ■ Hii A ■ ■ 
W Vd t j(.Y)W7 

o 


B(X) ■■= EE ( 


if dij{X) > 0, 
if dij(X) = 0, 


( 6 ) 

( 7 ) 
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so that 


p{X) = tr X'B(X)X, (8) 

r](X) = tr X'VX. (9) 


Note that V has off-diagonal elements equal to —Wij and B(X) has off-diagonal elements 


bij(X) 


—w. 
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Sjj 

dij(X) 


if dij(X) > 0 and zero otherwise. Diagonal elements are filled to make rows and columns 
sum to zero. Both V and B(X) are doubly-centered and positive semi-definite. If we assume 
without loss of generality that the MDS problem is irredicible (De Leeuw (1977)) then 
rank(D) = n — 1 and the only vectors in the null space are multiples of e := (1,1, • • • , 1). If 
all weights are eual to one then V — nl — ee!. 


3 Local minima of stress 

De Leeuw (1984) proves a key necessary condition for a local minimum in MDS. Also see De 
Leeuw, Groenen, and Mair (2016) for a more recent proof. We also review some additional 
results on maxima and minima of stress. 

Theorem 1: [De Leeuw 84, De Leeuw 93, De Leeuw and Groenen 97] 

1. If a has a local minimum at X then dij(X) > 0 for all j such that WijSij > 0. 

2. If a has a local minimum at X then a is differentiable in a neighborhood of X. Thus 
B(X)X = VX. 

3. If o has a local minimum at a column-centered X and rank(A^) = n — 1 then cr(X) = 0. 

4. cr has a local maximum at X if and only if X = 0. 

Let’s look at a small example with four points, all dissimilarities equal, and all weights equal 
to one. There is a local minimum with four points in the corners of a square, with stress equal 
to 0.0285955. And there is another local minimum with three points forming an equilateral 
triangle, and the fourth point in the center. This has stress 0.0669873. We can compute 
stress for all points of the form aX + f3Y, where X and Y are the two local minima. Figure 
1 has a contour plot of cr(a,/3), showing the local minima at (1,0) and (0,1), and the local 
maximum at (0,0). 
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-0.5 0.0 0.5 1.0 1.5 


Figure 1: Plane spanned by two local minima, equal dissimilarities 

Alternatively, we can plot stress on the line connecting X and Y. Note that although stress 
only has a local maximum at the origin in configuration space, it can very well have local 
maxima if restricted to lines. 
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Figure 2: Line connecting two local minima, equal dissimilarities 


4 Cross Product Space 

So far we have formulated the MDS problem in configuration space. Stress is a function of A", 
the n x p configuration matrix. We now consider an alternative formulation, where stress is 
a function of a positive semi-definite C or order n. The relevant definitions are 

a(C):=l-2p(C) + V (C), (10) 

where 


p(C) := tr B(C)C, 
r)(C) := tr VC, 

with 
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B(C) := EE 

l<i<j<n 


W. 


ij A ■ ■ 
l 3 dij (C) ^ % 3 


if dij(C) > 0, 
if d tJ (C) = 0. 


and dy(C) := tr AijC. 

We call the space of all positive semi-definite n x n matrices cross product space. The problem 
of minimizing o over n x p-dimensional configuration space is equivalent to the problem 
of minimizing o over the set of matrices C in n x n-dimensional cross product space that 
have rank less than or equal to p. The corresponding solutions are related by the simple 
relationship C = XX'. 

Theorem 2: [De Leeuw 93] Stress is convex on cross product space. 

Proof: First, r) is linear in C. Second, 

p(C ) = EE WijSijy/ tr AijC. 

l<i<ji<n 

This is the weighted sum of square roots of non-negative functions that are linear in C, and 
it is consequently concave. Thus a is convex. QED 

Unfortunately the subset of cross product space of all matrices with rank less than or equal 
to p is far from simple (see Datorro (2015)), so computational approaches to MDS prefer to 
work in configuration space. 


5 Full-dimensional Scaling 

Cross product space, the set of all positive semi-definite matrices, is a closed convex cone 1C 
in the linear space of all n x n symmetric matrices. This has an interesting consequence. 

Theorem 3: [De Leeuw 93] Full-dimensional scaling, i.e. minimizing a over 1C, is a convex 
programming problem. Thus in FDS all local minima are global. If WijSij > 0 for all i,j then 
the minimum is unique. 

This result has been around since about 1985. De Leeuw (1993) gives a proof, but the report 
it appeared in remained unpublished. A published proof is in De Leeuw and Groenen (1997). 
Another treatment of FDS, with a somewhat different emphasis, is in De Leeuw (2014). 

Now, by a familiar theorem (Theorem 31.4 in Rockafcllar (1970)), a matrix C minimizes cr 
over 1C if and only if 


C e 1C, 

(ii) 

V — B(C) G 1C, 

(12) 

tr C(V-B(C)) =0. 

(13) 


We give a computational proof of this result for FDS that actually yields a bit more. 
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Theorem 4: [Expansion] For A G /C we have 


cr(C + eA) = cr(C) — 2e 2 ^ Wi8i\Jtr A + e tr (V — B(C))A + o(e). (14) 

tr 


Proof: Simple expansion. QED 

Theorem 5: [Convex]: Suppose C is a solution to the problem of minimizing a over 1C. 
Then 

1. tr AijC > 0 for all i,j for which WijSij > 0. 

2. V — B{C ) is positive semi-definite. 

3. tr C(V-B(C)) = 0. 

4. If C is positive definite then V = B(C ) and cr(C) = 0. 

Proof: The term in (14) needs to vanish at a local minimum. This proves the first part. 
It follows that at a local minimum 


a(C + eA) = cr(C) + e tr (V - B(C)) A + o(e). 

If V — B(C) is not positive semi-definite, then there is a A G /C such that tr (V — B(C ))A < 0. 
Thus C cannot be the minimum, which proves the second part. If we choose A = C we find 

cr((l + e)C) = <t{C) + e tr (V - B(C))C + o(e). 

and choosing e small and negative shows we must have tr (V — B(C))C = 0 for C to be a 
minimum. This proves the third part. Finally, if a has a minimum at C, and C is positive 
definite, then from parts 2 and 3 we have V = B(C). Comparing off-diagonal elements shows 
A = -D(C), and thus cr(C) = 0. QED 

If C is the solution of the FDS problem, then rank(C') defines the Gower rank of the 
dissimilarities. The number of positive eigenvalues of the negative of the doubly-centered 
matrix of squared dissimilarities, the matrix factored in classical MDS, defines the Torgerson 
rank of the dissimilarities. The Gower conjecture is that the Gower rank is less than or equal 
to the Torgerson rank. No proof and no counter examples have been found. 

We compute the FDS solution using the srnacof algorithm 

A (fc+1) = V + B(X (k) ) (15) 

in the space of all n x n configurations, using the identity matrix as a default starting point. 
Since we work in configuration space, not in crossproduct space, this does not guarantee 
convergence to the unique FDS solution, but after convergence we can easily check the 
necessary and sufficient conditions of theorem 5. 
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As a small example, consider four points with all dissimilarities equal to one, except S 14 which 
is equal to three. Clearly the triangle inequality is violated, and thus there certainly is no 
perfect fit mapping into Euclidean space. 

The FDS solution turns out to have rank two, thus the Gower rank is two. The singular 
values of the FDS solution are 

## [1] 0.4508464709 0.2125310645 0.0000001303 


Gower rank two also follows from the eigenvalues of the matrix B(C), which are 
## [1] 1.0000000000 1.0000000000 0.9205543464 

6 Another planar example 

Suppose X and Y are two linearly independent configurations and consider all configurations 
Z = aX + /3Y in the plane spanned by A" and Y . Define 


B(l) := EE < 

w- 5ij 

dijiaX+pY) 

tr X'AijX 
tr Y'AijX 

tr X'AijY 
tr Y'AijY 

l<i<7<n 

0 




if dij{aX + (3Y) > 0, 
if dij(aX + f3Y) = 0, 


where 7 : = 


a 

P 


. Also 


tr X'VX tr X'VY 
tr Y'VX tr Y'VY 


Then 


< 7 ( 7 ) := 1 — 27 ' 5 ( 7)7 + y't/y, (16) 

and the stationary equations are 

5 ( 7)7 = U 7 . 

We have already seen an example of o on what can be called combination space in figure 1, 
where X and Y are two local minima. Another interesting example uses a configuration X 
and its gradient Y = T>a{X). In that case finding the minimum of a over combination space 
means finding the optimal step size in the steepest descent gradient method. 

For a configuration with three points equally spaced on a horizontal fine and the fourth point 
above the middle of the three 



## C,13 [,2] 

## [1J -0.1006209 -0.1006209 
## [2,] 0.0000000 -0.1006209 

## [3,] 0.1006209 -0.1006209 

## [4,] 0.0000000 0.3018627 


we have gradient 

[, 2 ] 

0.08772788 
0.07804082 
0.08772788 
0.25349657 


## [, 1 ] 

## 1 0.3250910 - 
## 2 0.3638044 - 
## 3 -0.6888953 - 
## 4 0.0000000 


The contour plot for the linear combinations is 



Figure 3: Plane spanned by configuration and gradient, equal dissimilarities 

Note that the plot shows various ridges, where stress is not differentiable because some of the 
distances in the linear combination are zero. 
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in this case, where we are interested in step size, we should look at linear combinations with 
a = 1 and (3 < 0, i.e. of the form A" — \f3\Va{X). Figure Figure 4 shows the initial descent 
and an optimal step size around .3. 



Figure 4: Along the path of steepest descent, equal dissimilarities 


7 Ekman example 


The Ekman (1954) color data give similarities between 14 colors. 


## 434 445 465 472 490 504 537 555 584 600 610 628 651 

## 445 0.86 

## 465 0.42 0.50 

## 472 0.42 0.44 0.81 

## 490 0.18 0.22 0.47 0.54 

## 504 0.06 0.09 0.17 0.25 0.61 

## 537 0.07 0.07 0.10 0.10 0.31 0.62 

## 555 0.04 0.07 0.08 0.09 0.26 0.45 0.73 

## 584 0.02 0.02 0.02 0.02 0.07 0.14 0.22 0.33 
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## 600 0.07 0.04 0.01 0.01 0.02 0.08 0.14 0.19 0.58 

## 610 0.09 0.07 0.02 0.00 0.02 0.02 0.05 0.04 0.37 0.74 

## 628 0.12 0.11 0.01 0.01 0.01 0.02 0.02 0.03 0.27 0.50 0.76 

## 651 0.13 0.13 0.05 0.02 0.02 0.02 0.02 0.02 0.20 0.41 0.62 0.85 

## 674 0.16 0.14 0.03 0.04 0.00 0.01 0.00 0.02 0.23 0.28 0.55 0.68 0.76 


We use three different transformations of the similarities to dissimilarities. The first is 1 — x, 
the second (1 — x) 3 and the third \/l — x. We need the following iterations to find the FDS 
solution (up to a change in loss of le-15). 


## 

power = 

1.00 

itel = 

6936 

stress = 

0.0000875293 

## 

power = 

3.00 

itel = 

171 

stress = 

0.0110248119 

## 

power = 

0.33 

itel = 

423 

stress = 

0.0000000000 


For the same three solutions we compute singular values of the thirteen-dimensional FDS 
solution. 


## [ 1 ] 
## [ 6 ] 
## [ 11 ] 
## 

## [ 1 ] 
## [ 6 ] 
## [ 11 ] 
## 

## [ 1 ] 
## [ 6 ] 
## [ 11 ] 


0.1797609824 

0.0393576522 

0.0000000009 

0.2159661347 

0.0000000000 

0.0000000000 

0.1336126813 

0.0664988952 

0.0468628701 


0.1454675297 

0.0236290817 

0.0000000000 

0.1549184093 

0.0000000000 

0.0000000000 

0.1139019875 

0.0561005006 

0.0410193579 


0.0843865491 

0.0162344515 

0.0000000000 

0.0000000727 

0.0000000000 

0.0000000000 

0.0880453752 

0.0535112029 

0.0388896490 


0.0777136109 

0.0072756171 


0.0000000041 

0.0000000000 


0.0851609618 

0.0492295395 


0.0486123551 

0.0000031164 


0.0000000000 

0.0000000000 


0.0710424935 

0.0479964575 


Thus the Gower ranks of the transformed dissimilarities are, repectively, nine (or ten), two, 
and thirteen. Note that for the second set of dissimilarities, with Gower rank two, the first 
two principal components of the thirteen-dimensional solution are the global minimizer in two 
dimensions. To illustrate the Gower rank in yet another way we give the thirteen non-zero 
eigenvalues of V + B(X), so that the Gower rank is the number of eigenvalues equal to one. 
All three solutions satisfy the necessary and sufficient conditions for a global FDS solution. 


## [1] 1.0000000432 1.0000000222 1.0000000012 1.0000000005 1.0000000002 

## [ 6 ] 1.0000000001 1.0000000000 1.0000000000 0.9999993553 0.9989115116 

## [11] 0.9976821885 0.9942484083 0.9825147154 

## 

## [1] 1.0000000000 1.0000000000 0.9234970864 0.9079012130 0.8629365849 

## [ 6 ] 0.8526920031 0.8298036209 0.8145561677 0.7932385763 0.7916517225 

## [11] 0.7864426781 0.7476794757 0.7282682474 
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## 

## [ 1 ] 1.0000000820 1.0000000241 1.0000000047 1.0000000009 1.0000000004 
## [ 6 ] 1.0000000003 1.0000000001 1.0000000001 1.0000000001 1.0000000000 
## [ 11 ] 0.9999999999 0.9999999689 0.9999999005 


We also plot the first two principal components of the thirteen-dimensional FDS solution. Not 
surprisingly, they look most circular and regular for the solution with power three, because 
this actually is the global minimum over two-dimensional solutions. The other configurations 
still have quite a lot of variation in the remaining dimensions. 


power = 1.00 


power = 3.00 


power = 0.33 


CM 
c 
o 
c n 
c 

CD 

E 

TD 



CM 
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c n 
c 
0 
E 
-o 



CM 
c 
o 
c n 
c 
0 
E 



dimension 1 


dimension 1 


dimension 1 


Figure 5: Ekman data, configurations for three powers 

Figure 6 illustrates that the FDS solution with power 3 is quite different from power 1 and 
power one 1/3 Basically the transformations with lower powers result in dissimilarity measures 
that are very similar to Euclidean distances in a high-dimensional configuration, while power 
equal to 3 makes the dissimilarties less Euclidean. This follows from metric transform theory, 
where concave increasing transforms of finite metric spaces tend to be Euclidean. In particular 
the square root transformation of a finite metric space has the Euclidean four-point property, 
and there is a c > 0 such that the metric transform f(t) = ct/( 1 + ct) makes a finite metric 
space Euclidean (Maehara (1986)). 
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power = 1.00 


power = 3.00 


power = 0.33 



Figure 6: Ekman data, fit plots for three powers 


8 Code 


library (MASS) 

torgerson <- function(delta, p = 2) { 
doubleCenter <- function(x) { 
n <- dim(x) [1] 
m <- dim(x) [2] 
s <- sum(x) / (n * m) 
xr <- rowSums(x) / m 
xc <- colSums(x) / n 
return((x - outer(xr, xc, "+")) + s) 

} 

z <- 

eigen(-doubleCenter((as.matrix (delta) 2) / 2) , symmetric = TRUE) 
v <- pmax(z$values, 0) 
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return (z$vectors [, 1 : p] diag(sqrt (v [1 : p]))) 

> 

u <- function (i, n) { 

return (ifelse (i == l:n, 1, 0)) 

> 

e <- function (i, j, n) { 
d <- u (i, n) - u (j , n) 
return (outer (d, d)) 


directSum <- function (x) { 
m <- length (x) 
nr <- sum (sapply (x, nrow)) 
nc <- sum (sapply (x, ncol)) 
z <- matrix (0, nr, nc) 
kr <- 0 
kc <- 0 

for (i in l:m) { 

ir <- nrow (x[[i]]) 
ic <- ncol (x[[i]]) 

z [kr + (l:ir), kc + ( 1 :ic)] <- x [ [i]] 
kr <- kr + ir 
kc <- kc + ic 

} 

return (z) 

} 

repList <- function(x, n) { 
z <- list() 
for (i in l:n) 

z <- c(z, list(x)) 
return(z) 


makeA <- function (n) { 
m <- n * (n - 1) / 2 
a <- list() 
for (j in 1 :(n - 1) ) 
for (i in (j + 1) :n) { 
d <- u (i, n) - u (j , n) 
e <- outer (d, d) 
a <- c(a, list (e)) 
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> 

return (a) 

} 

makeD <- function (a, x) { 

return (sapply (a, function (z) 
sqrt (sum (x * (z %*% x))))) 

> 

makeB <- function (w, delta, d, a) { 
n <- length (a) 
m <- nrow (a[[1]]) 
b <- matrix (0, m , m) 
for (i in l:n) 

b <- b + w[i] * (delta[i] / d[i]) * a[ [i] ] 
return (b) 

} 

makeV <- function (w, a) { 
n <- length (a) 
m <- nrow (a[ [1] ]) 
v <- matrix (0, m, m) 
for (i in l:n) 

v <- v + w[i] * a[[i]] 
return (v) 

} 

inBetween <- function (alpha, beta, x, y, w, delta, a) { 
z <- alpha * x + beta * y 
d <- makeD (a, z) 

return (sum (w * (delta - d) 2)) 

} 

biBase <- function (x, y, a) { 
biBi <- function (x, y, v) { 
all <- sum (x * (v °/„*°/„ x)) 

al2 <- sum (x * ( v %*% y)) 

a22 <- sum (y * ( v %*% y )) 

return (matrix (c(all, al2, al2, a22), 2, 2)) 

} 

return (lapply (a, function (u) biBi (x, y, u))) 

} 

fullMDS <- 


15 


function (delta, 

w = rep (1, length (delta)), 
xini, 
a, 

itmax = 100, 
eps = le-6, 
verbose = TRUE) { 
m <- length (a) 
v <- makeV (w, a) 
vv <- ginv (v) 
xold <- xini 
dold <- makeD (a, xini) 
sold <- sum ((delta - dold) ~ 2) 
bold <- makeB (w, delta, dold, a) 
itel <- 1 
repeat { 

xnew <- vv °/„*°/„ bold %*% xold 
dnew <- makeD (a, xnew) 
bnew <- makeB (w, delta, dnew, a) 
snew <- sum ((delta - dnew) ~ 2) 
if (verbose) { 
cat ( 

formate (itel, width = 4, format = "d"), 
formate ( 
sold, 

digits = 10, 
width = 13, 
format = "f" 

), 

formate ( 
snew, 

digits = 10, 
width = 13, 
format = "f" 

), 

"\n" 

) 

} 

if ((itel == itmax) || (abs(sold - snew) < eps)) 
break 

itel <- itel + 1 
xold <- xnew 
dold <- dnew 
sold <- snew 
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bold <- bnew 


> 

return (list (x = xnew, d = dnew, delta = delta, s = snew, b = bnew, v = v, itel 

} 

9 NEWS 

001 01/21/16 Made a start 

002 01/23/16 Reorganization and further material 
003 01/24/16 First upload. Figures, tables, examples. 

004 01/25/16 Corrected small errors 

005 01/25/16 Reference to metric transform theory 
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